Recognition of Noisy Speech: Using Minimum-Mean Log-Spectral Distance Estimation
نویسندگان
چکیده
A model-based spectral estimation algorithm is derived that improves the robustness of speech recognition systems to additive noise. The algorithm is tailored for filter-bank-based systems, where the estimation should seek to minimize the distortion as measured by the recognizer's distance metric. This estimation criterion is approximated by minimizing the Euclidean distance between spectral log-energy vectors, which is equivalent to minimizing the nonweighted, nontruncated cepstral distance. Correlations between frequency channels are incorporated in the estimation by modeling the spectral distribution of speech as a mixture of components, each representing a different speech class, and assuming that spectral energies at different frequency channels are uncorrelated within each class. The algorithm was tested with SRI's continuous-speech, speaker-independent, hidden Markov model recognition system using the largevocabulary NIST "Resource Management Task." When trained on a clean-speech database and tested with additive white Gaussian noise, the new algorithm has an error rate half of that with MMSE estimation of log spectral energies at individual frequency channels, and it achieves a level similar to that with the ideal condition of training and testing at constant SNR. The algorithm is also very efficient with additive environmental noise, recorded with a desktop microphone.
منابع مشابه
Time-Varying Noise Estimation for Speech Enhancement and Recognition Using Sequential Monte Carlo Method
We present a method for sequentially estimating time-varying noise parameters. Noise parameters are sequences of time-varying mean vectors representing the noise power in the log-spectral domain. The proposed sequential Monte Carlo method generates a set of particles in compliance with the prior distribution given by clean speech models. The noise parameters in this model evolve according to ra...
متن کاملDistributed multichannel speech enhancement with minimum mean-square error short-time spectral amplitude, log-spectral amplitude, and spectral phase estimation
In this paper, the authors present optimal multichannel frequency domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators utilize Rayleigh and Gaussian statistical models for the speech prior and noise likelihood with a diffuse noise f...
متن کاملTwo-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation
Traditional two-microphone noise reduction algorithms to deal with highly nonstationary directional noises generally use the direction of arrival or phase difference information. The performance of these algorithms deteriorate when diffuse noises coexist with nonstationary directional noises in realistic adverse environments. In this paper, we present a two-channel noise reduction algorithm usi...
متن کاملImproving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...
متن کاملSpeech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...
متن کامل